Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/roachprod: ensure wipe really wipes #41552

Merged
merged 3 commits into from
Oct 13, 2019

Conversation

petermattis
Copy link
Collaborator

Previously, roachprod wipe was being run by the default user (now
ubuntu) which meant it couldn't remove files created by root. It was also
accidentally swallowing errors. Now roachprod wipe runs as root when
wiping remote nodes, ensuring it really cleans up any files.

The failure to remove root owned files was causing the disk-stall tests
to be flaky because files created through charybdefs are created as root.
These root owned files could be removed, but cockroach also creates a root
owned directory and the files in that root owned directory could not be
removed. If two of the disk-stall tests ran on the same cluster, the
first could end up leaving around files that the second encountered and
complained about.

Fixes #41530

Release note: None

Noticed while debugging the `disk-stall` roachtest. Forcing 0755
permissions on the `cockroach-temp*` directory is done for consistency
with every other directory created by cockroach.

Release note: None
Previously, `roachprod wipe` was being run by the default user (now
`ubuntu`) which meant it couldn't remove files created by root. It was
also accidentally swallowing errors. Now `roachprod wipe` runs as root
when wiping remote nodes, ensuring it really cleans up any files.

The failure to remove root owned files was causing the `disk-stall`
tests to be flaky because files created through `charybdefs` are created
as root. These root owned files could be removed, but cockroach also
creates a root owned directory and the files in that root owned
directory could not be removed. If two of the `disk-stall` tests ran on
the same cluster, the first could end up leaving around files that the
second encountered and complained about.

Fixes cockroachdb#41530

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis petermattis requested review from ajwerner and tbg October 12, 2019 17:23
@petermattis
Copy link
Collaborator Author

I ran roachtest run -c peter-test --parallelism=1 --count=10 disk-stalled which runs the disk-stall tests sequentially on a single cluster. Prior to the roachprod wipe fix this would fail every time. Now it succeeds.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

@petermattis
Copy link
Collaborator Author

TFTR!

bors r=tbg

craig bot pushed a commit that referenced this pull request Oct 13, 2019
41552: cmd/roachprod: ensure wipe really wipes r=tbg a=petermattis

Previously, `roachprod wipe` was being run by the default user (now
`ubuntu`) which meant it couldn't remove files created by root. It was also
accidentally swallowing errors. Now `roachprod wipe` runs as root when
wiping remote nodes, ensuring it really cleans up any files.

The failure to remove root owned files was causing the `disk-stall` tests
to be flaky because files created through `charybdefs` are created as root.
These root owned files could be removed, but cockroach also creates a root
owned directory and the files in that root owned directory could not be
removed. If two of the `disk-stall` tests ran on the same cluster, the
first could end up leaving around files that the second encountered and
complained about.

Fixes #41530

Release note: None

Co-authored-by: Peter Mattis <[email protected]>
@craig
Copy link
Contributor

craig bot commented Oct 13, 2019

Build succeeded

@craig craig bot merged commit ceda2df into cockroachdb:master Oct 13, 2019
@petermattis petermattis deleted the pmattis/roachtest-disk-stalled branch October 13, 2019 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: disk-stalled/log=true,data=false failed
3 participants